Results 1 - 17 of 17
1.
iScience ; 27(4): 109584, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38623337

ABSTRACT

Peptidyl arginine deiminases (PADIs) catalyze protein citrullination, a post-translational conversion of arginine to citrulline. The most widely expressed member of this family, PADI2, regulates cellular processes that impact several diseases. We hypothesized that we could gain new insights into PADI2 function through a systematic evolutionary and structural analysis. Here, we identify 20 positively selected PADI2 residues, 16 of which are structurally exposed and maintain PADI2 interactions with cognate proteins. Many of these selected residues reside in non-catalytic regions of PADI2. We validate the importance of a prominent loop in the middle domain that encompasses PADI2 L162, a residue under positive selection. This site is essential for interaction with the transcription elongation factor (P-TEFb) and mediates the active transcription of the oncogenes c-MYC and CCNB1, as well as impacting cellular proliferation. These insights could be key to understanding and addressing the role of the PADI2 c-MYC axis in cancer progression.

2.
Genes (Basel) ; 14(4)2023 03 28.
Article in English | MEDLINE | ID: mdl-37107571

ABSTRACT

Neurological disorders (ND) are diseases that affect the brain and the central and autonomic nervous systems, such as neurodevelopmental disorders, cerebellar ataxias, Parkinson's disease, or epilepsies. The American College of Medical Genetics and Genomics now strongly recommends applying next generation sequencing (NGS) as a first-line test in patients with these disorders. Whole exome sequencing (WES) is widely regarded as the current technology of choice for diagnosing monogenic ND. The introduction of NGS allows for rapid and inexpensive large-scale genomic analysis and has led to enormous progress in deciphering monogenic forms of various genetic diseases. The simultaneous analysis of several potentially mutated genes improves the diagnostic process, making it faster and more efficient. The main aim of this report is to discuss the impact and advantages of the implementation of WES into the clinical diagnosis and management of ND. Therefore, we have performed a retrospective evaluation of WES application in 209 cases referred by neurologists or clinical geneticists to the Department of Biochemistry and Molecular Genetics of the Hospital Clinic of Barcelona for WES. In addition, we have further discussed some important facts regarding classification criteria for pathogenicity of rare variants, variants of unknown significance, deleterious variants, different clinical phenotypes, or frequency of actionable secondary findings. Different studies have shown that WES implementation establishes a diagnostic rate of around 32% in ND and that continuous molecular diagnosis is essential to solve the remaining cases.


Subject(s)
Epilepsy , Exome , Humans , Exome Sequencing , Retrospective Studies , Exome/genetics , Phenotype , Epilepsy/diagnosis , Epilepsy/genetics
3.
J Eur Acad Dermatol Venereol ; 37(5): 914-921, 2023 May.
Article in English | MEDLINE | ID: mdl-36695073

ABSTRACT

BACKGROUND: Blue nevi are benign dermal melanocytic proliferations that are often easy to recognize clinically. Rarely, these lesions can display atypical features, suggesting the presence of a malignant blue nevus or mimicking cutaneous metastases of melanoma. OBJECTIVE: To describe the clinical evolution of blue nevi over time and to assess the need for monitoring these lesions. METHODS: We conducted a retrospective cohort study of 103 patients who were followed between December 1998 and November 2019. An artificial intelligence algorithm was used to identify blue nevi from the databases of two digital epiluminescence devices. Changes in the area of each lesion were calculated with a segmentation neural network. RESULTS: We included 123 blue nevi from 103 patients. Most of the lesions segmented, 99 (91.7%), were considered stable. Of the 9 (8.3%) growing blue nevi identified, 2 (1.85%) showed significant growth. The growing blue nevi studied turned out to be cellular blue nevi with a low tumour mutation burden, and a GNAQ c.626A>T alteration was identified in both lesions. LIMITATIONS: Some clinical variants of blue nevi might not be included. CONCLUSIONS: Most blue nevi remain stable during their evolution. Rarely, they can show progressive growth, although histopathological or molecular signs of malignancy have not been identified.


Subject(s)
Melanoma , Nevus, Blue , Skin Neoplasms , Humans , Nevus, Blue/pathology , Retrospective Studies , Artificial Intelligence , Melanoma/pathology , Skin Neoplasms/pathology
4.
J Clin Med ; 11(13)2022 Jun 21.
Article in English | MEDLINE | ID: mdl-35806855

ABSTRACT

Lethal congenital contracture syndrome 11 (LCCS11) is caused by homozygous or compound heterozygous variants in the GLDN gene on chromosome 15q21. GLDN encodes gliomedin, a protein required for the formation of the nodes of Ranvier and development of the human peripheral nervous system. We report a fetus with ultrasound alterations detected at 28 weeks of gestation. The fetus exhibited hydrops, short long bones, fixed limb joints, absent fetal movements, and polyhydramnios. The pregnancy was terminated and postmortem studies confirmed the prenatal findings: distal arthrogryposis, fetal growth restriction, pulmonary hypoplasia, and retrognathia. The fetus had a normal chromosomal microarray analysis. Exome sequencing revealed two novel compound heterozygous variants in GLDN associated with LCCS11. This manuscript reports this case and performs a literature review of all published LCCS11 cases.

5.
PeerJ ; 9: e12395, 2021.
Article in English | MEDLINE | ID: mdl-34820176

ABSTRACT

The aim of this study was to generate and analyze the atlas of the loggerhead turtle blood transcriptome by RNA-seq, as well as identify and characterize thioredoxin (Txns) and peroxiredoxin (Prdxs) antioxidant enzymes of the greatest interest in the control of peroxide levels and other biological functions. The transcriptome of loggerhead turtle was sequenced using the Illumina HiSeq 2000 platform and de novo assembly was performed using the Trinity pipeline. The assembly comprised 515,597 contigs with an N50 of 2,631 bp. Contigs were analyzed with CD-Hit obtaining 374,545 unigenes, of which 165,676 had ORFs encoding putative proteins longer than 100 amino acids. A total of 52,147 (31.5%) of these transcripts had significant homology matches in at least one of the five databases used. From the enrichment of GO terms, 180 proteins with antioxidant activity were identified, among these 28 Prdxs and 50 putative Txns. The putative proteins of loggerhead turtles encoded by the genes Prdx1, Prdx3, Prdx5, Prdx6, Txn and Txnip were predicted and characterized in silico. When comparing Prdxs and Txns of loggerhead turtle with homologous human proteins, they showed 18 (9%), 52 (18%), 94 (43%), 36 (16%), 35 (33%) and 74 (19%) amino acid mutations, respectively. However, they showed high conservation in active sites and structural motifs (98%), with few specific modifications. Of these, Prdx1, Prdx3, Prdx5, Prdx6, Txn and Txnip presented 0, 25, 18, 3, 6 and 2 deleterious changes, respectively. This study provides a high quality blood transcriptome and functional annotation of loggerhead sea turtles.
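
The N50 reported for the assembly can be illustrated with a minimal sketch. It assumes the common definition of N50 (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length). The function name `n50` and the toy lengths are hypothetical and are not taken from the study or from the Trinity pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the running sum first reaches half
    of the total assembled length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy data (hypothetical): total length 1500 bp, so half is 750 bp.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))  # 500 + 400 = 900 >= 750, so this prints 400
```

An N50 is sensitive to how many short contigs are retained, so values are best compared between assemblies filtered in the same way.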

6.
PLoS One ; 16(7): e0252509, 2021.
Article in English | MEDLINE | ID: mdl-34260637

ABSTRACT

The current global pandemic due to SARS-CoV-2 has pushed the limits of global health systems across all aspects of clinical care, including laboratory diagnostics. Supply chain disruptions and rapidly shifting markets have resulted in flash-scarcity of commercial laboratory reagents; this has motivated health care providers to search for alternative workflows to cope with the international increase in demand for SARS-CoV-2 testing. The aim of this study is to present a reproducible workflow for real time RT-PCR SARS-CoV-2 testing using OT-2 open-source liquid-handling robots (Opentrons, NY). We have developed a framework that includes a code template which is helpful for building different stand-alone robotic stations, capable of performing specific protocols. Such stations can be combined to create a complex multi-stage workflow, from sample setup to real time RT-PCR. Using our open-source code, it is easy to create new stations or workflows from scratch, adapt existing templates to update the experimental protocols, or fine-tune the code to fit specific needs. Using this framework, we developed the code for two different workflows and evaluated them using external quality assessment (EQA) samples from the European Molecular Genetics Quality Network (EMQN). The affordability of this platform makes automated SARS-CoV-2 PCR testing accessible for most laboratories and hospitals with qualified bioinformatics personnel. This platform also allows for flexibility, as it is not dependent on any specific commercial kit, and thus it can be quickly adapted to protocol changes, reagent or consumable shortages, or any other temporary material constraints.


Subject(s)
COVID-19 Nucleic Acid Testing/instrumentation , SARS-CoV-2/isolation & purification , Clinical Coding , Early Diagnosis , Humans , RNA, Viral/genetics , Real-Time Polymerase Chain Reaction/instrumentation , Reverse Transcriptase Polymerase Chain Reaction/instrumentation , Robotics , SARS-CoV-2/genetics , Workflow
7.
Nat Commun ; 12(1): 604, 2021 01 27.
Article in English | MEDLINE | ID: mdl-33504782

ABSTRACT

De novo gene origination has been recently established as an important mechanism for the formation of new genes. In organisms with a large genome, intergenic and intronic regions provide plenty of raw material for new transcriptional events to occur, but little is known about how de novo transcripts originate in more densely packed genomes. Here, we identify 213 de novo originated transcripts in Saccharomyces cerevisiae using deep transcriptomics and genomic synteny information from multiple yeast species grown in two different conditions. We find that about half of the de novo transcripts are expressed from regions which already harbor other genes in the opposite orientation; these transcripts show similar expression changes in response to stress as their overlapping counterparts, and some appear to translate small proteins. Thus, a large fraction of de novo genes in yeast are likely to co-evolve with already existing genes.


Subject(s)
Genes, Fungal , Saccharomyces cerevisiae/genetics , Transcriptome/genetics , Conserved Sequence/genetics , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Open Reading Frames/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
8.
Exp Cell Res ; 391(1): 111940, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32156600

ABSTRACT

High throughput RNA sequencing techniques have revealed that a large fraction of the genome is transcribed into long non-coding RNAs (lncRNAs). Unlike canonical protein-coding genes, lncRNAs do not contain long open reading frames (ORFs) and tend to be poorly conserved across species. However, many of them contain small ORFs (sORFs) that exhibit translation signatures according to ribosome profiling or proteomics data. These sORFs are a source of putative novel proteins; some of them may confer a selective advantage and be maintained over time, a process known as de novo gene birth. Here we review the mechanisms by which randomly occurring sORFs in lncRNAs can become new functional proteins.


Subject(s)
Evolution, Molecular , Genome , Open Reading Frames , Protein Biosynthesis , RNA, Long Noncoding/genetics , Ribosomes/genetics , Animals , Brain/metabolism , Humans , Liver/metabolism , Male , Molecular Sequence Annotation , Myocardium/metabolism , Organ Specificity , RNA, Long Noncoding/classification , RNA, Long Noncoding/metabolism , Ribosomes/classification , Ribosomes/metabolism , Testis/metabolism , Transcription, Genetic
9.
Cancers (Basel) ; 11(9)2019 Sep 10.
Article in English | MEDLINE | ID: mdl-31510016

ABSTRACT

The growth of cancer cells as oncospheres in three-dimensional (3D) culture provides a robust cell model for understanding cancer progression, as well as for early drug discovery and validation. We have previously described a novel pathway in breast cancer cells, whereby ADP (adenosine diphosphate)-ribose derived from hydrolysis of poly(ADP-ribose) and pyrophosphate (PPi) are converted to ATP, catalysed by the enzyme NUDT5 (nucleotide diphosphate hydrolase type 5). Overexpression of the NUDT5 gene in breast and other cancer types is associated with poor prognosis, increased risk of recurrence and metastasis. In order to understand the role of NUDT5 in cancer cell growth, we performed phenotypic and global expression analysis in breast cancer cells grown as oncospheres. Comparison of two-dimensional (2D) versus 3D cancer cell cultures from different tissues of origin suggests that NUDT5 increases the aggressiveness of the disease via the modulation of several key driver genes, including ubiquitin specific peptidase 22 (USP22), RAB35B, focadhesin (FOCAD) and prostaglandin E synthase (PTGES). NUDT5 functions as a master regulator of key oncogenic pathways and of genes involved in cell adhesion, cancer stem cell (CSC) maintenance and epithelial-to-mesenchymal transition (EMT). Inhibiting the enzymatic activities of NUDT5 prevents oncosphere formation and precludes the activation of cancer driver genes. These findings highlight NUDT5 as an upstream regulator of tumour drivers and may provide a biomarker for cancer stratification, as well as a novel target for drug discovery for combinatorial drug regimens for the treatment of aggressive cancer types and metastasis.

10.
Nucleic Acids Res ; 47(13): 6842-6857, 2019 07 26.
Article in English | MEDLINE | ID: mdl-31175824

ABSTRACT

Although transposable elements are an important source of regulatory variation, their genome-wide contribution to the transcriptional regulation of stress-response genes has not been studied yet. Stress is a major aspect of natural selection in the wild, leading to changes in the transcriptional regulation of a variety of genes that are often triggered by one or a few transcription factors. In this work, we take advantage of the wealth of information available for Drosophila melanogaster and humans to analyze the role of transposable elements in six stress regulatory networks: immune, hypoxia, oxidative, xenobiotic, heat shock, and heavy metal. We found that transposable elements were enriched for caudal, dorsal, HSF, and tango binding sites in D. melanogaster and for NFE2L2 binding sites in humans. Taking into account the D. melanogaster population frequencies of transposable elements with predicted binding motifs and/or binding sites, we showed that those containing three or more binding motifs/sites are more likely to be functional. For a representative subset of these TEs, we performed in vivo transgenic reporter assays in different stress conditions. Overall, our results showed that TEs are relevant contributors to the transcriptional regulation of stress-response genes.


Subject(s)
DNA Transposable Elements/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Gene Expression Regulation/genetics , Genes, Insect , Stress, Physiological/genetics , Transcription, Genetic/genetics , Amino Acid Motifs , Animals , Animals, Genetically Modified , Aryl Hydrocarbon Receptor Nuclear Translocator/metabolism , Binding Sites , Chromatin Immunoprecipitation , Drosophila Proteins/genetics , Drosophila melanogaster/drug effects , Drosophila melanogaster/embryology , Drosophila melanogaster/immunology , Female , Gene Regulatory Networks , Humans , NF-E2-Related Factor 2/metabolism , Protein Binding , Species Specificity , Transcription Factors/metabolism
11.
PLoS Genet ; 15(2): e1007900, 2019 02.
Article in English | MEDLINE | ID: mdl-30753202

ABSTRACT

Most of the current knowledge on the genetic basis of adaptive evolution is based on the analysis of single nucleotide polymorphisms (SNPs). Despite increasing evidence for their causal role, the contribution of structural variants to adaptive evolution remains largely unexplored. In this work, we analyzed the population frequencies of 1,615 Transposable Element (TE) insertions annotated in the reference genome of Drosophila melanogaster, in 91 samples from 60 worldwide natural populations. We identified a set of 300 polymorphic TEs that are present at high population frequencies, and located in genomic regions with high recombination rate, where the efficiency of natural selection is high. The age and the length of these 300 TEs are consistent with relatively young and long insertions reaching high frequencies due to the action of positive selection. In addition, we identified a set of 21 fixed TEs also likely to be adaptive. Indeed, we, and others, found evidence of selection for 84 of these reference TE insertions. The analysis of the genes located near these 84 candidate adaptive insertions suggested that the functional response to selection is related to the GO categories of response to stimulus, behavior, and development. We further showed that a subset of the candidate adaptive TEs affects expression of nearby genes, and five of them have already been linked to an ecologically relevant phenotypic effect. Our results provide a more complete understanding of the genetic variation and the fitness-related traits relevant for adaptive evolution. Similar studies should help uncover the importance of TE-induced adaptive mutations in other species as well.


Subject(s)
Behavior, Animal/physiology , DNA Transposable Elements/genetics , Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental/genetics , Genome, Insect/genetics , Mutation/genetics , Stress, Physiological/genetics , Animals , Evolution, Molecular , Gene Frequency/genetics , Polymorphism, Single Nucleotide/genetics , Selection, Genetic/genetics
12.
Genome Res ; 29(1): 29-39, 2019 01.
Article in English | MEDLINE | ID: mdl-30552103

ABSTRACT

In breast cancer cells, some topologically associating domains (TADs) behave as hormonal gene regulation units, within which gene transcription is coordinately regulated in response to steroid hormones. Here we further describe that responsive TADs contain 20- to 100-kb-long clusters of intermingled estrogen receptor (ESR1) and progesterone receptor (PGR) binding sites, hereafter called hormone-control regions (HCRs). In T47D cells, we identified more than 200 HCRs, which are frequently bound by unliganded ESR1 and PGR. These HCRs establish steady long-distance inter-TAD interactions between them and organize characteristic looping structures with promoters in their TADs even in the absence of hormones in ESR1+-PGR+ cells. This organization is dependent on the expression of the receptors and is further dynamically modulated in response to steroid hormones. HCRs function as platforms that integrate different signals, resulting in some cases in opposite transcriptional responses to estrogens or progestins. Altogether, these results suggest that steroid hormone receptors act not only as hormone-regulated sequence-specific transcription factors but also as local and global genome organizers.


Subject(s)
Estrogen Receptor alpha/biosynthesis , Estrogens/pharmacology , Gene Expression Regulation/drug effects , Progesterone/pharmacology , Receptors, Progesterone/biosynthesis , Response Elements , Signal Transduction/drug effects , Estrogen Receptor alpha/genetics , Humans , MCF-7 Cells , Receptors, Progesterone/genetics
13.
Nat Ecol Evol ; 2(5): 890-896, 2018 05.
Article in English | MEDLINE | ID: mdl-29556078

ABSTRACT

Accumulating evidence indicates that some protein-coding genes have originated de novo from previously non-coding genomic sequences. However, the processes underlying de novo gene birth are still enigmatic. In particular, the appearance of a new functional protein seems highly improbable unless there is already a pool of neutrally evolving peptides that are translated at significant levels and that can at some point acquire new functions. Here, we use deep ribosome-profiling sequencing data, together with proteomics and single nucleotide polymorphism information, to search for these peptides. We find hundreds of open reading frames that are translated and that show no evolutionary conservation or selective constraints. These data suggest that the translation of these neutrally evolving peptides may be facilitated by the chance occurrence of open reading frames with a favourable codon composition. We conclude that the pervasive translation of the transcriptome provides plenty of material for the evolution of new functional proteins.


Subject(s)
Evolution, Molecular , Peptides/chemistry , Polymorphism, Single Nucleotide , Ribosomes/chemistry , Animals , High-Throughput Nucleotide Sequencing , Humans , Mice , Mice, Inbred BALB C , Proteomics
14.
Mol Ecol ; 27(3): 709-722, 2018 02.
Article in English | MEDLINE | ID: mdl-29319912

ABSTRACT

Hibernation is an adaptive strategy some mammals use to survive highly seasonal or unpredictable environments. We present the first investigation on the transcriptomics of hibernation in a natural population of primate hibernators: Crossley's dwarf lemurs (Cheirogaleus crossleyi). Using capture-mark-recapture techniques to track the same animals over a period of 7 months in Madagascar, we used RNA-seq to compare gene expression profiles in white adipose tissue (WAT) during three distinct physiological states. We focus on pathway analysis to assess the biological significance of transcriptional changes in dwarf lemur WAT and, by comparing and contrasting what is known in other model hibernating species, contribute to a broader understanding of genomic contributions of hibernation across Mammalia. The hibernation signature is characterized by a suppression of lipid biosynthesis, pyruvate metabolism and mitochondrial-associated functions, and an accumulation of transcripts encoding ribosomal components and iron-storage proteins. The data support a key role of pyruvate dehydrogenase kinase isoenzyme 4 (PDK4) in regulating the shift in fuel economy during periods of severe food deprivation. This pattern of PDK4 holds true across representative hibernating species from disparate mammalian groups, suggesting that the genetic underpinnings of hibernation may be ancestral to mammals.


Subject(s)
Animals, Wild/genetics , Animals, Wild/physiology , Cheirogaleidae/genetics , Cheirogaleidae/physiology , Hibernation/genetics , Transcriptome/genetics , Animals , Body Temperature , Carbohydrate Metabolism/genetics , Gene Expression Profiling , Iron/metabolism , Lipid Metabolism/genetics , Mitochondria/metabolism , Protein Biosynthesis/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism
15.
Genome Biol Evol ; 8(8): 2413-26, 2016 Aug 25.
Article in English | MEDLINE | ID: mdl-27412611

ABSTRACT

Hibernation is a complex physiological response that some mammalian species employ to evade energetic demands. Previous work in mammalian hibernators suggests that hibernation is activated not by a set of genes unique to hibernators, but by differential expression of genes that are present in all mammals. This question of universal genetic mechanisms requires further investigation and can only be tested through additional investigations of phylogenetically dispersed species. To explore this question, we use RNA-Seq to investigate gene expression dynamics as they relate to the varying physiological states experienced throughout the year in a group of primate hibernators: Madagascar's dwarf lemurs (genus Cheirogaleus). In a novel experimental approach, we use longitudinal sampling of biological tissues as a method for capturing gene expression profiles from the same individuals throughout their annual hibernation cycle. We identify 90 candidate genes that have variable expression patterns when comparing two active states (Active 1 and Active 2) with a torpor state. These include genes that are involved in metabolic pathways, feeding behavior, and circadian rhythms, as might be expected to correlate with seasonal physiological state changes. The identified genes appear to be critical for maintaining the health of an animal that undergoes prolonged periods of metabolic depression concurrent with the hibernation phenotype. By focusing on these differentially expressed genes in dwarf lemurs, we compare gene expression patterns in previously studied mammalian hibernators. Additionally, by employing evolutionary rate analysis, we find that hibernation-related genes do not evolve under positive selection in hibernating species relative to nonhibernators.


Subject(s)
Cheirogaleidae/genetics , Gene Expression Regulation/genetics , Hibernation/genetics , Phylogeny , Animals , Gene Expression Profiling , Madagascar , Mammals/genetics , Microarray Analysis , Protein Biosynthesis/genetics , RNA , Seasons
16.
Integr Comp Biol ; 54(3): 452-62, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24881044

ABSTRACT

In recent years, the study of the molecular processes involved in mammalian hibernation has shifted from investigating a few carefully selected candidate genes to large-scale analysis of differential gene expression. The availability of high-throughput data provides an unprecedented opportunity to ask whether phylogenetically distant species show similar mechanisms of genetic control, and how these relate to particular genes and pathways involved in the hibernation phenotype. In order to address these questions, we compare 11 datasets of differentially expressed (DE) genes from two ground squirrel species, one bat species, and the American black bear, as well as a list of genes extracted from the literature that previously have been correlated with the drastic physiological changes associated with hibernation. We identify several genes that are DE in different species, indicating either ancestral adaptations or evolutionary convergence. When we use a network approach to expand the original datasets of DE genes to large gene networks using available interactome data, a higher agreement between datasets is achieved. This indicates that the same key pathways are important for activating and maintaining the hibernation phenotype. Functional-term-enrichment analysis identifies several important metabolic and mitochondrial processes that are critical for hibernation, such as fatty acid beta-oxidation and mitochondrial transport. We do not detect any enrichment of positive selection signatures in the coding sequences of genes from the networks of hibernation-associated genes, supporting the hypothesis that the genetic processes shaping the hibernation phenotype are driven primarily by changes in gene regulation.


Subject(s)
Gene Expression Regulation/physiology , Gene Regulatory Networks/physiology , Genomics/methods , Mammals/genetics , Mammals/physiology , Animals , Energy Metabolism/genetics , Energy Metabolism/physiology , Phylogeny , Species Specificity
17.
Genome Biol Evol ; 5(2): 457-67, 2013.
Article in English | MEDLINE | ID: mdl-23377868

ABSTRACT

Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, one single-protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank(+F). Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives. We show that PALO can eliminate the majority of such false positives and thus that it is a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
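
The idea of selecting, for each gene family, the combination of isoforms most similar in length can be sketched as follows. This is not the PALO implementation. It is a brute-force illustration that assumes "most similar in length" means the smallest gap between the longest and shortest selected isoform, which is one possible reading of the abstract. The function name, the input layout, and the toy family are all hypothetical.

```python
from itertools import product


def pick_similar_length_isoforms(family):
    """Pick one isoform per gene so that the chosen lengths are closest.

    family: dict mapping gene name -> dict of isoform id -> protein length.
    Returns (choice, spread), where choice maps gene -> isoform id and
    spread is max length minus min length over the chosen isoforms.
    Assumption: length similarity is scored as max - min. Exhaustive
    search, so only practical for small families.
    """
    genes = sorted(family)
    best_choice, best_spread = None, None
    # Enumerate every combination with exactly one isoform per gene.
    for combo in product(*(sorted(family[g].items()) for g in genes)):
        lengths = [length for _, length in combo]
        spread = max(lengths) - min(lengths)
        if best_spread is None or spread < best_spread:
            best_spread = spread
            best_choice = {g: iso for g, (iso, _) in zip(genes, combo)}
    return best_choice, best_spread


# Toy family (hypothetical lengths in amino acids).
family = {
    "human": {"h1": 520, "h2": 410},
    "mouse": {"m1": 405, "m2": 300},
    "dog": {"d1": 415},
}
choice, spread = pick_similar_length_isoforms(family)
print(choice, spread)
```

In this toy case the longest-isoform rule would pick h1 (520 aa) for human, whereas the length-similarity rule picks h2 (410 aa), which sits within 10 aa of the mouse and dog isoforms. The exhaustive search grows exponentially with the number of genes, so a real tool would need a more efficient strategy.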


Subject(s)
Evolution, Molecular , Selection, Genetic/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Conserved Sequence/genetics , Genome , Internet , Phylogeny , Protein Isoforms/genetics , Software , Species Specificity